
Arm backend: Modernise and standalone the executor runner#19018

Open
usamahz wants to merge 15 commits into
pytorch:mainfrom
usamahz:feature/standalone-runner

Conversation

@usamahz
Collaborator

@usamahz usamahz commented Apr 21, 2026

Summary

This PR modernizes the ExecuTorch Arm bare-metal runner workflow so users can move from a PyTorch model to a runnable
Arm executor runner with fewer manual build-system steps, stronger validation, and faster repeated local iteration.

The main change is a new standalone Arm executor runner CMake entry point. run.sh now acts as the orchestration
layer for common Ethos-U bare-metal flows: it can derive build directories, configure the standalone runner with Arm
bare-metal defaults, stage generated PTE/BPTE files, validate reused CMake caches, build the needed runner target,
locate the runner binary, and invoke FVP.

Problem

Before this change, the Arm runner workflow depended on manually stitching together ExecuTorch build/install
artifacts, runner CMake configuration, PTE input wiring, toolchain and target settings, optional debug features, and
repeated install/export steps.

That made the workflow harder to explain, fragile in CI, slower to iterate on locally, and easy to break when reusing
a build directory configured for a different target or feature set.

And a shorter version if the PR description is already long:

CMake Architecture Change

flowchart LR
    subgraph Before
        A1["Build ExecuTorch<br/>arm-baremetal preset"] --> A2["Install/export artifacts"]
        A2 --> A3["Configure runner CMake<br/>examples/arm/executor_runner"]
        A4["PTE / BPTE"] --> A3
        A3 --> A5["arm_executor_runner ELF"]
    end

    subgraph After
        B1["run.sh"] --> B2["Validate / choose build dir"]
        B2 --> B3["Standalone runner CMake<br/>examples/arm/executor_runner/standalone"]
        B4["PTE / BPTE"] --> B1
        B3 --> B5["ExecuTorch top-level CMake<br/>as subdirectory"]
        B3 --> B6["Arm CMake helpers + presets"]
        B5 --> B7["arm_executor_runner ELF"]
        B6 --> B7
    end

What Changed

  • Added examples/arm/executor_runner/standalone as the supported standalone CMake entry point for
    arm_executor_runner.
  • Added shared Arm CMake helpers for Ethos-U SDK setup, required target validation, and predictable runner output
    paths.
  • Updated build_executor_runner.sh and run.sh to use the standalone runner workflow.
  • Added deterministic default build directories under --et_build_root.
  • Added cache validation for reused build directories, including target, toolchain, selected ops, PTE placement,
    BundleIO, ETDump, and devtools settings.
  • Added PTE/BPTE staging so repeated runs can reuse the same configured CMake build directory.
  • Integrated selective-op handling into the standalone runner path.
  • Cleaned up bare-metal install/export behavior so standalone builds can consume reusable build-tree artifacts.
  • Updated Arm README and notebooks for the new workflow.
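The cache validation item above can be illustrated with a minimal sketch. It assumes only that a reused build directory contains a standard CMakeCache.txt whose entries have the form KEY:TYPE=VALUE. The helper names and the cache variable names in the usage note are hypothetical and are not taken from run.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one entry from a CMakeCache.txt file.
# Cache entries have the form KEY:TYPE=VALUE.
cache_value() {
  local cache_file="$1" key="$2"
  # Strip the "KEY:TYPE=" prefix and print whatever follows it.
  sed -n "s/^${key}:[A-Z]*=//p" "${cache_file}" | head -n 1
}

# Hypothetical check: fail when the cached value differs from the value
# requested for this run, so a mismatched build dir is not silently reused.
validate_cache_entry() {
  local cache_file="$1" key="$2" expected="$3" actual
  actual="$(cache_value "${cache_file}" "${key}")"
  if [[ "${actual}" != "${expected}" ]]; then
    echo "Build dir mismatch: ${key}='${actual}', expected '${expected}'" >&2
    return 1
  fi
}
```

A caller might run `validate_cache_entry "${build_dir}/CMakeCache.txt" EXAMPLE_NPU_TARGET ethos-u55-128` before rebuilding, where EXAMPLE_NPU_TARGET is a placeholder for whichever cache variable records the target.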

Iteration Speed

Repeated local PTE-to-runner iteration is now 8x faster because run.sh can reuse the configured standalone CMake build directory, stage updated PTE/BPTE payloads into the existing cache wiring, and rebuild only the needed runner target instead of repeating the full manual configure/install/export flow.

This is a developer workflow speedup, not a model runtime speedup.
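One way the staging step could work is sketched below. This is an assumption about the mechanism, not the code in run.sh: the regenerated payload is copied to a fixed path inside the build directory, and the copy is skipped when the contents are unchanged so the staged file's timestamp does not trigger a rebuild. The function name stage_payload and the paths are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a regenerated PTE/BPTE to a fixed path in the build dir.
stage_payload() {
  local src="$1" dst="$2"
  mkdir -p "$(dirname "${dst}")"
  # Skip the copy when the contents are identical, which leaves the staged
  # file's timestamp untouched so the build tool sees nothing to rebuild.
  if [[ -f "${dst}" ]] && cmp -s "${src}" "${dst}"; then
    echo "unchanged"
  else
    cp "${src}" "${dst}"
    echo "staged"
  fi
}
```

Because the staged path stays the same between runs, the configured CMake cache can keep pointing at it and only the payload changes.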

Result

For common Ethos-U bare-metal usage, the user-facing path is now script-owned and repeatable:

  1. Run Arm setup.
  2. Run examples/arm/run.sh with a model and target.
  3. Reuse or inspect the generated build directory under --et_build_root.
  4. Iterate by regenerating the PTE/BPTE and rebuilding through the same validated CMake cache.

VGF host flows remain explicit: run.sh requires an existing --build-dir for VGF-style host builds rather than
auto-configuring them as bare-metal runner builds.

Testing

Validated through the Arm backend runner, bare-metal, VGF, and CI workflows covered by this stack.

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani

@pytorch-bot

pytorch-bot Bot commented Apr 21, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19018

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 6 Pending

As of commit 74a3d1b with merge base 65c7ee2 (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label Apr 21, 2026
@github-actions github-actions Bot added the ciflow/trunk and module: arm labels Apr 21, 2026
@usamahz usamahz marked this pull request as draft April 21, 2026 12:31
@usamahz
Collaborator Author

usamahz commented Apr 21, 2026

I will merge after all 8 commits land in this PR - so until then please do not merge! :)

@usamahz usamahz force-pushed the feature/standalone-runner branch from 10dc5c9 to 9127071 Compare May 8, 2026 14:34
@usamahz usamahz marked this pull request as ready for review May 8, 2026 14:40
@usamahz usamahz marked this pull request as draft May 8, 2026 14:41
@usamahz usamahz marked this pull request as ready for review May 11, 2026 08:17
@usamahz usamahz added the release notes: arm label May 11, 2026
@usamahz usamahz force-pushed the feature/standalone-runner branch 2 times, most recently from 423ca2e to b4e729e Compare May 12, 2026 10:53
@usamahz usamahz requested a review from zingo May 12, 2026 11:02
@usamahz usamahz requested a review from mergennachin as a code owner May 12, 2026 11:03
@usamahz
Collaborator Author

usamahz commented May 12, 2026

This PR is ready to review

@zingo
Collaborator

zingo commented May 12, 2026

Hi @digantdesai, this PR is large, contains a lot of CMake changes, and replaces (in a better way) a PR you reviewed yesterday. Is it OK to merge this?
We don't think it should need any buck2 changes.

Adding @GregoryComer as well, since you were added to the old PR.

@zingo zingo added the help wanted label May 12, 2026
Member

@GregoryComer GregoryComer left a comment


Looks reasonable to me. I'll defer to @digantdesai, but I believe there should be no buck impact and you should be clear to merge.


set(CMAKE_INSTALL_PREFIX "${CMAKE_BINARY_DIR}")
define_overridable_option(
EXECUTORCH_BAREMETAL_SKIP_INSTALL
Contributor

@digantdesai digantdesai May 13, 2026


Nit: why use a negative name (SKIP) rather than a positive one set to OFF? Also, we already have EXECUTORCH_BUILD_ARM_BAREMETAL.

Collaborator Author


Agreed. I named it around the special-case behavior: the normal bare-metal flow installs artifacts, while the standalone runner opts out. But a positive option is clearer and matches EXECUTORCH_BUILD_ARM_BAREMETAL better. In a future commit, I’ll rename it to something like EXECUTORCH_ARM_BAREMETAL_INSTALL_ARTIFACTS, default ON, and set it OFF for the standalone runner path.

Comment thread examples/arm/run.sh
Contributor


This file is getting too large, we should either split it up or move to python :p

Collaborator Author


We should actually move it to Python. Thanks, that's a nice idea!

@digantdesai
Contributor

@claude review this PR

@claude

claude Bot commented May 13, 2026

Claude finished @digantdesai's task in 2m 33s


PR Review: Arm backend standalone runner CMake

  • Read and analyze changed files
  • Review CMake architecture (standalone entry point, helpers)
  • Review shell script changes (run.sh, build_executor_runner.sh, fvp_utils.sh)
  • Review test changes
  • Check for issues (security, correctness, maintainability)
  • Post comprehensive review

Overall Assessment

This is a well-structured PR that introduces a standalone CMake entry point for the Arm bare-metal runner, moving orchestration into run.sh. The CMake architecture is sound — the selective-ops variable save/restore in standalone/CMakeLists.txt is particularly clever, and the cache validation in run.sh prevents subtle mis-configuration bugs. The shared CMake helpers (ArmEthosUSDK.cmake, ArmRunnerUtils.cmake) are clean and focused.

I agree with @digantdesai's comment that run.sh is getting large (~850 lines). That's not a blocker for this PR, but it's worth tracking.


Issues & Suggestions

1. Shell word-splitting risk in extra_build_flags handling

examples/arm/run.sh:349-351 and examples/arm/run.sh:671-672:

# shellcheck disable=SC2206
local extra_args=(${extra_build_flags})
cmake_cmd+=("${extra_args[@]}")

The SC2206 disable acknowledges the issue but doesn't resolve it. If extra_build_flags contains values with spaces (e.g., -DFOO="some value"), word splitting will break them. This is pre-existing behavior from build_executor_runner.sh, but it's now duplicated in two places within run.sh.

Consider documenting the limitation (no spaces in flag values) or using eval / read -ra for proper quoting.
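The splitting problem can be reproduced in isolation. The sketch below uses a made-up flag string and a small count_args helper (both hypothetical) to compare the unquoted expansion from the snippet with an array that holds one element per flag.

```shell
#!/usr/bin/env bash
# Print how many arguments a command would receive.
count_args() { echo "$#"; }

extra_build_flags='-DFOO="some value" -DBAR=1'

# Unquoted expansion, as in the reviewed snippet: the string is split on
# whitespace and the quote characters stay literal, so -DFOO breaks in two.
# shellcheck disable=SC2206
split_args=(${extra_build_flags})
count_args "${split_args[@]}"   # prints 3

# Alternative: keep the flags in an array from the start, one element per flag.
flag_array=(-DFOO="some value" -DBAR=1)
count_args "${flag_array[@]}"   # prints 2
```

As far as I know, `read -ra` also splits on IFS without interpreting embedded quotes, so passing an array end to end (or documenting the no-spaces limitation) looks like the more robust of the options mentioned.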

2. EXECUTORCH_BAREMETAL_SKIP_INSTALL preset logic is confusing (echoing @digantdesai's review comment)

tools/cmake/preset/arm_baremetal.cmake:6-9:

define_overridable_option(
  EXECUTORCH_BAREMETAL_SKIP_INSTALL
  "Skip emitting install/export rules when building bare-metal artifacts" BOOL
  ON
)

SKIP_INSTALL defaults to ON, but the standalone runner then overrides it to OFF (-DEXECUTORCH_BAREMETAL_SKIP_INSTALL=OFF), which creates a double-negative mental model. The interaction between this variable, CMAKE_SKIP_INSTALL_RULES, and the guard in the top-level CMakeLists.txt:166-177 is non-trivial:

  • Preset sets SKIP_INSTALL=ON → CMAKE_SKIP_INSTALL_RULES is forced OFF (confusingly)
  • Standalone runner sets SKIP_INSTALL=OFF → top-level CMakeLists sets CMAKE_SKIP_INSTALL_RULES=ON

This works but is hard to reason about. A positive name like EXECUTORCH_BUILD_ARM_INSTALL_RULES (ON by default for the runner, OFF by default in the preset) would be clearer. Not a blocker, but worth considering for future maintainability.

3. configure_ethosu_scratch_if_requested re-runs cmake without all original flags

examples/arm/run.sh:656-678: When --specify_ethosu_scratch is used, the function re-invokes cmake -S ... -B ... with only extra_build_flags + the scratch size flag. CMake will reuse its cache for other values, but if the initial configure failed or the cache is partially populated, this could produce unexpected results. It would be safer to also pass -DEXECUTORCH_ROOT=... and the critical flags that the initial configure used.

4. Potential unbound variable in fvp_utils.sh fix

backends/arm/scripts/run_fvp.sh:154,167: The fix from "${extra_args_u55[@]}" to ${extra_args_u55[@]+"${extra_args_u55[@]}"} is the correct set -u safe idiom for empty arrays in older Bash versions. Good fix.
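A small standalone demonstration of the idiom, using a placeholder array name and placeholder options rather than the real FVP arguments:

```shell
#!/usr/bin/env bash
set -u

# Print how many arguments a command would receive.
count_args() { echo "$#"; }

extra_args=()   # empty, e.g. no extra options were requested

# A plain "${extra_args[@]}" can abort with "unbound variable" under set -u
# on Bash versions before 4.4. The +-expansion below yields nothing when the
# array is empty and the individually quoted elements otherwise.
count_args ${extra_args[@]+"${extra_args[@]}"}   # prints 0

extra_args=(--opt "value with spaces")
count_args ${extra_args[@]+"${extra_args[@]}"}   # prints 2
```

The inner quoted expansion is what preserves elements containing spaces. The outer expansion is deliberately left unquoted so that an empty array contributes no arguments at all.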

5. MAX_KERNEL_NUM hardcoded to 2000

examples/arm/executor_runner/standalone/CMakeLists.txt:125-131:

if(NOT DEFINED CACHE{MAX_KERNEL_NUM} AND NOT DEFINED MAX_KERNEL_NUM)
  set(MAX_KERNEL_NUM 2000 ...)
endif()

This magic number is quite high and may waste memory on constrained targets. Consider adding a brief inline comment explaining why 2000 (e.g., "conservative default to cover quantized + cortex-m + portable op registrations").

6. The redirect in executor_runner/CMakeLists.txt is a UX improvement

examples/arm/executor_runner/CMakeLists.txt:6-13: The FATAL_ERROR guiding users to standalone/ is a great touch for discoverability.

7. Install guard for ethosu_core_driver is duplicated

The ethosu_core_driver install logic with the ET_ETHOSU_CORE_DRIVER_EXPORTED global property guard appears in both:

  • backends/arm/CMakeLists.txt:116-128 (new)
  • examples/arm/executor_runner/CMakeLists.txt:214-228

This is intentional (one handles the sub-project case, the other handles standalone), but the guard ensures idempotency correctly.

8. locate_runner_binary uses find as fallback

examples/arm/run.sh:529:

found=$(find "${build_dir}" -name "${binary_name}" -type f 2>/dev/null | head -n 1 || true)

In large build trees this could be slow or return unexpected matches (e.g., from a CMake fetchcontent directory). The explicit candidate list above should cover all realistic cases, making the find fallback defensive-only. Consider logging a warning if the fallback is hit so developers know their output layout is non-standard.
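A rough sketch of the suggested warning, under the assumption that explicit candidates are tried first. The function name locate_binary and the candidate sub-paths are placeholders and do not reproduce the list in run.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: try known output locations first, then fall back to find.
locate_binary() {
  local build_dir="$1" binary_name="$2" candidate found
  # Placeholder candidate paths, not the list used by run.sh.
  for candidate in "${build_dir}/${binary_name}" "${build_dir}/bin/${binary_name}"; do
    if [[ -f "${candidate}" ]]; then
      echo "${candidate}"
      return 0
    fi
  done
  # Defensive fallback: search the whole tree and warn on stderr when it is used.
  found="$(find "${build_dir}" -name "${binary_name}" -type f 2>/dev/null | head -n 1 || true)"
  if [[ -n "${found}" ]]; then
    echo "WARNING: ${binary_name} found via fallback search at ${found}" >&2
    echo "${found}"
    return 0
  fi
  return 1
}
```

Writing the warning to stderr keeps stdout clean, so callers that capture the path with command substitution are unaffected.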

9. cortex_m/CMakeLists.txt — install directory exposure

backends/cortex_m/CMakeLists.txt:54-72 (new):

install(DIRECTORY "${cmsis_nn_include_dir}/"
        DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/cmsis-nn"
)

This installs CMSIS-NN headers into the install tree. When CMAKE_SKIP_INSTALL_RULES is ON (standalone sub-project mode), this won't fire. But in the bare-metal preset with SKIP_INSTALL=ON → CMAKE_SKIP_INSTALL_RULES=OFF, this will install headers. Make sure the install prefix is still local (it is — CMAKE_INSTALL_PREFIX is set to CMAKE_BINARY_DIR in that case).


Minor Nits

  • examples/arm/run.sh:77: The typo "BundelIO" appears in the help text (should be "BundleIO"). This is pre-existing.
  • examples/arm/run.sh:153: "ERROR: Placing the PTE in memory failed, address is larger then 64bit" — "then" → "than". Pre-existing.
  • backends/arm/scripts/build_executor_runner.sh:41: Help text still says "pte file (generated by the aot_arm_compier)" — typo "compier" → "compiler". Pre-existing.

Summary

The architecture is clean and the cache validation approach is well thought out. The main concerns are around the double-negative naming of EXECUTORCH_BAREMETAL_SKIP_INSTALL (echoing @digantdesai) and the growing size of run.sh. No blocking issues found — LGTM with the naming consideration as a follow-up.


@digantdesai
Contributor

Thanks for this. I just skimmed it since it's huge, and asked Claude to review as well.

@digantdesai
Contributor

Pulling in for an internal CI run. If it's green we can merge it. OSS CI seems to have unrelated failures.

@digantdesai
Contributor

Also dropping #17814?

@meta-codesync
Contributor

meta-codesync Bot commented May 13, 2026

@digantdesai has imported this pull request. If you are a Meta employee, you can view this in D104931966.

@usamahz usamahz force-pushed the feature/standalone-runner branch from 949526a to cc0e457 Compare May 13, 2026 08:52
usamahz added 10 commits May 13, 2026 09:58
- Detect standalone invocation, derive EXECUTORCH_ROOT, and expose
  ARM_EXECUTOR_RUNNER_STANDALONE for diagnostics.
- Load ExecuTorch presets and add_subdirectory(EXECUTORCH_ROOT ...) when
  building out of tree.
- Refresh pte_to_header.py with a shebang and updated Arm copyright.

Change-Id: I4582326c72a0b571c495aca64b2c58e45bfbb5be
Signed-off-by: Usamah Zaheer <[email protected]>
- Auto-detect Python and corstone helpers so standalone builds mirror
  setup.sh and run.sh.
- Reuse the in-tree Ethos-U core driver unless a custom path is supplied
  and optionally fetch the SDK into arm-scratch.
- Validate delegate prerequisites and enforce ET_PTE_FILE_PATH behavior.
- Halt when no PTE or semihosting mode is provided.

Change-Id: Iadd5dcd5e1a12dca7a00117c7778e9580364294a
Signed-off-by: Usamah Zaheer <[email protected]>
- Call gen_oplist.py through the configured Python interpreter only when
  a model PTE exists.
- Reference CMAKE_CURRENT_SOURCE_DIR for generated headers and linker
  scripts so out-of-tree builds resolve paths correctly.
- Normalize runner outputs and sanitizer helpers so the standalone build
  mirrors the superbuild.
- When BundleIO reuses a separate ExecuTorch build tree where
  bundled_program is not part of this CMake graph, restrict the fallback
  lookup to caller-provided build directories so the runner does not pick
  up an unrelated host library.

Change-Id: I9932d8d7434e8a834b21ac9bbf290361d7ec117b
Signed-off-by: Usamah Zaheer <[email protected]>
- Honor EXECUTORCH_BAREMETAL_SKIP_INSTALL so embedders can disable
  install() rules.
- Propagate Ethos-U delegate includes, install the core driver when
  available, and copy CMSIS-NN headers for downstream toolchains.
- Route the arm_baremetal preset install output back into the build tree
  to keep standalone builds self-contained.

Change-Id: I84bb6a1ad64a404e10e8ce8897167e595b8b82fa
Signed-off-by: Usamah Zaheer <[email protected]>
- Force EXECUTORCH_BAREMETAL_SKIP_INSTALL=OFF so build_executorch.sh always exports the Arm runner dependencies.
- Stop building the install target on non-musl hosts; the default build target already covers what run.sh needs and avoids redundant installs.

Change-Id: Iecd91e4a3eb275ca67ce6593ebfb06d3d7ec42ef
Signed-off-by: Usamah Zaheer <[email protected]>
- Clarify help text for select_ops_list, toolchain choices, and add
  --build-dir reuse.
- Track whether select_ops_list was overridden, allow arbitrary cmake -D
  flags, and tidy scratch or toolchain warnings.
- Plumb the new option state through the control flow to prepare for
  automation.

Change-Id: I69b027e726eee0b23206e7e3c836db375a8bf5b6
Signed-off-by: Usamah Zaheer <[email protected]>
- Auto-derive arm_executor_runner build directories when --build-dir is
  omitted and configure them with the arm_baremetal preset.
- Add validation helpers that ensure standalone builds were configured
  with the right targets, toolchains, and BundledIO/devtools toggles.
- Teach the script to stage PTEs, reuse multi-config build trees, and
  drive FVP/BundleIO workflows from a single entry point.

Change-Id: If52327a1bc512c87fd2ce5d9ce89c352919fd447
Signed-off-by: Usamah Zaheer <[email protected]>
- Explain the auto-configured runner build flow and scratch directory
  expectations in examples/arm/README.md.
- Update the Ethos-U notebook to export EXECUTORCH_ROOT before calling
  standalone cmake.

Change-Id: If9f4f456c03b7a36a27ffdd1dfd1873ec286d07b
Signed-off-by: Usamah Zaheer <[email protected]>
Allow VGF host runner builds to reuse existing top-level CMake build
directories without requiring the bare-metal standalone marker.

Pin the standalone Arm runner registry size to the default capacity
unless the user overrides MAX_KERNEL_NUM. This prevents selected-op
cache sizing from undersizing binaries that also link quantized and
Cortex-M registration libraries.

Change-Id: I6716c454ec5d9d3adbff756afc14fe8739268520
Signed-off-by: Usamah Zaheer <[email protected]>
Update generated Ethos-U docs and docgen templates to point users at
the standalone Arm executor runner CMake entry point.

This replaces the old two-step install and direct runner configure flow.

Signed-off-by: Usamah Zaheer <[email protected]>
Change-Id: I582b87033c7d50a4219fc01a01f1b5ddd980e8e4
@usamahz usamahz force-pushed the feature/standalone-runner branch from cc0e457 to a05bd23 Compare May 13, 2026 08:59
@usamahz usamahz changed the title Arm backend: Allow Arm executor_runner CMake to run standalone Arm backend: Modernise standalone executor runner builds May 13, 2026
@usamahz usamahz changed the title Arm backend: Modernise standalone executor runner builds Arm backend: Modernise and standalone the executor runner May 13, 2026
@usamahz usamahz force-pushed the feature/standalone-runner branch from a05bd23 to 5b82dc5 Compare May 13, 2026 11:13
TOSA-only run.sh invocations stop after AOT export and do not build or
run an Arm executor runner, so avoid requiring the bare-metal toolchain
for those targets.

The Cortex-M E2E CI wrapper can also invoke run.sh fallback setup, so
accept the FVP EULA in that CI caller instead of making run.sh infer
acceptance from CI=true.

Signed-off-by: Usamah Zaheer <[email protected]>
Change-Id: Ic154c5dc6327ee7d882429f11f82fa9c8d7a17e1
@usamahz usamahz force-pushed the feature/standalone-runner branch from 5b82dc5 to c626207 Compare May 13, 2026 12:17
@usamahz
Collaborator Author

usamahz commented May 13, 2026

Will merge after Meta's internal CI passes

cc: @digantdesai


Labels

ciflow/trunk, CLA Signed, help wanted, module: arm, release notes: arm


4 participants